Large-scale similarity data management with distributed Metric Index
Authors
Abstract
Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management remains an open research challenge. In this work, we take an important step towards a management system that can scale to data collections of billions of objects. We propose a distributed index structure for similarity data management called the Metric Index (M-Index), which can answer queries both precisely and approximately. The technique can use any distributed hash table that supports interval queries as its underlying index. We have performed numerous experiments to test various settings of the M-Index structure, and we have demonstrated its usability by developing a full-featured, publicly available Web application. © 2010 Elsevier Ltd. All rights reserved.
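The abstract does not spell out how metric objects are mapped onto a store that only supports interval queries, so the following is only a minimal sketch of one well-known way to do it, not the M-Index mapping itself. It assumes a pivot-based scalar key (index of the nearest pivot times a constant `c`, plus the distance to that pivot), Euclidean distance as the metric, and a sorted in-memory list standing in for a distributed hash table with interval queries. The names `make_key`, `PivotKeyIndex` and the parameter `c` are hypothetical.

```python
import bisect
from math import dist  # Euclidean distance (Python 3.8+), used here as the metric


def make_key(obj, pivots, c):
    """Map an object to a scalar key: nearest-pivot index * c + distance to that pivot.

    Assumption: c is larger than any distance that can occur in the data.
    """
    dists = [dist(obj, p) for p in pivots]
    i = min(range(len(pivots)), key=dists.__getitem__)
    return i * c + dists[i]


class PivotKeyIndex:
    """Sorted key list standing in for a key-value store with interval queries."""

    def __init__(self, pivots, c):
        self.pivots = pivots
        self.c = c
        self.keys = []  # sorted scalar keys
        self.objs = []  # objects, kept parallel to self.keys

    def insert(self, obj):
        key = make_key(obj, self.pivots, self.c)
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.objs.insert(pos, obj)

    def range_query(self, q, r):
        """Return all stored objects within distance r of q."""
        result = []
        for i, p in enumerate(self.pivots):
            dq = dist(q, p)
            # Triangle inequality: d(q, o) <= r implies |d(p, q) - d(p, o)| <= r,
            # so only keys inside this interval can belong to matching objects.
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.c, dq + r)
            if lo > hi:
                continue  # the query ball cannot reach this pivot's key range
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            # The interval yields candidates only; verify each with the real distance.
            result.extend(o for o in self.objs[start:end] if dist(q, o) <= r)
        return result
```

Each object is stored under exactly one key, so a range query issues one interval lookup per pivot and then filters the candidates by their true distance. In a distributed setting, each interval lookup would presumably be routed to the nodes responsible for that key range.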
Similar resources
Resource Description and Selection for Range Query Processing in General Metric Spaces
Similarity search in general metric spaces is a key aspect in many application fields. Metric space indexing provides a flexible indexing paradigm and is solely based on the use of a distance metric. No assumption is made about the representation of the database objects. Nowadays, ever-increasing data volumes require large-scale distributed retrieval architectures. Here, local and global indexi...
A Fuzzy Decision-Making Methodology for Risk Response Planning in Large-Scale Projects
Risk response planning is one of the main phases of project risk management and has a major impact on the success of a large-scale project. Since projects are unique and risks are dynamic throughout the life of a project, it is necessary to formulate responses to the important risks. Conventional approaches tend to be less effective in dealing with the impreciseness of risk response p...
Metric-Based Similarity Search in Unstructured Peer-to-Peer Systems
Peer-to-peer systems constitute a promising solution for deploying novel applications, such as distributed image retrieval. Efficient search over widely distributed multimedia content requires techniques for distributed retrieval based on generic metric distance functions. In this paper, we propose a framework for distributed metric-based similarity search, where each participating peer stores ...
A Multi-Criteria Decision-Making Approach with Interval Numbers for Evaluating Project Risk Responses
Risk response development is one of the main phases of project risk management and has a major impact on a large-scale project’s success. Since projects are unique and risks are dynamic throughout the life of a project, it is necessary to formulate responses to the important risks. Conventional approaches tend to be less effective in dealing with the imprecision of the risk response deve...
Large Scale Distributed Distance Metric Learning
In large scale machine learning and data mining problems with high feature dimensionality, the Euclidean distance between data points can be uninformative, and Distance Metric Learning (DML) is often desired to learn a proper similarity measure (using side information such as example data pairs being similar or dissimilar). However, high dimensionality and large volume of pairwise constraints i...
Journal title:
- Inf. Process. Manage.
Volume 48, issue
Pages -
Publication date: 2012